Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013

نویسندگان

  • Andrés Alfonso Caurcel Díaz
  • José María Gómez Hidalgo
چکیده

Inspired by our ongoing work in the project WENDY, which addresses age detection in social networks by linguistic processing (among other methods), we have built a system that makes use of a number of linguistic resources (a Spanish dictionary, and a SMS-language dictionary) and algorithms (custom text utterances tokenization, SMS to standard Spanish translation, and a number of normalization rules) in order to apply a learning-based approach using a custom Stochastic Gradient Descent algorithm adapted to text, to the Spanish Author Profiling task at PAN’2013. We believe the results obtained in internal testing on a validation set extracted from training dataset do validate our approach in WENDY, while the results obtained in the PAN task are not as good as expected.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

XRCE Personal Language Analytics Engine for Multilingual Author Profiling: Notebook for PAN at CLEF 2015

This technical notebook describes the methodology used – and results achieved – for the PAN 2015 Author Profiling Challenge by the team from Xerox Research Centre Europe (XRCE). This year, personality traits are introduced alongside age and gender in a corpus of tweets in four languages – English, Spanish, Italian and Dutch. We describe a largely language agnostic methodology for classification...

متن کامل

Using Simple Content Features for the Author Profiling Task Notebook for PAN at CLEF 2013

This paper describes the methods we have employed to solve the author profiling task at PAN-2013. Our goal was to use simple features to identify the age group and the gender of the author of a given text. We introduce the features, detail how the classifiers were trained, and how the experiments were run.

متن کامل

Readability for Author Profiling? Notebook for PAN at CLEF 2013

This paper briefly describes the approach taken to the Author Profiling task at PAN 13. It describes the simple features used, and the origins in thinking around text readability as a mechanism for identification, and the predictive model used which may have beneficially omitted classes, as well as offering commentary on the results obtained.

متن کامل

Author Profiling for English and Spanish Text Notebook for PAN at CLEF 2013

This paper describes an approach for the author profiling task of the PAN 2013 challenge. This work is based on the idea of linguistic modality that has been successfully used in other classification tasks such as authorship attribution. We consider three different modalities: syntactic, stylistic, and semantic, each representing a different aspect of text. For each modality, we extract informa...

متن کامل

UniNE at CLEF 2015 Author Profiling: Notebook for PAN at CLEF 2015

This paper describes and evaluates an effective author profiling model called SPATIUM-L1. The suggested strategy can be adapted without any problem to different languages (such as Dutch, English, Italian, and Spanish) in Twitter tweets. As features, we suggest using the 200 most frequent terms of the query text (isolated words and punctuation symbols). Applying a simple distance measure and loo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013